Preprocessing
The preprocessing script (R/001 - data processing) prepares election
data from three election years (2006, 2011, and 2018) for subsequent
analysis. Because the data sources differ in structure and electoral
studies have specific requirements, this section applies a series of
steps to ensure data consistency, accuracy, and usability. The focus is
on handling discrepancies across datasets, normalizing data structures,
and building a unified framework that supports analysis across all three
elections.
The data preprocessing involves several key steps:
- Data Reading and Cleaning: This step includes
importing data from various Excel files, removing irrelevant rows and
columns, and merging datasets where necessary. Specific adjustments are
made to handle unique identifiers and ensure alignment across different
datasets.
- Column Renaming and Nesting: The data is
reorganized to nest information about candidates and votes in a
structured format. This involves standardizing column names for
consistency and clarity across the datasets from different years.
- Geographic Organization and Indexing: Election data
is indexed and nested by geographic and administrative divisions, such
as provinces, cities, and constituencies. This step addresses
inconsistencies in geographic names and ensures that data is uniformly
formatted for easier comparison.
- Data Matching Across Elections: A multi-step
process is implemented to match geographic names across different years,
accommodating variations in naming conventions and ensuring that data
can be accurately compared.
- Data Merging: Utilizing the established indexes,
datasets from different years are merged to create a comprehensive view
that supports longitudinal analysis.
- Derived Quantities Calculation: This involves
estimating and aggregating registered voters, total votes, and specific
candidate votes across various administrative levels, enhancing the
dataset’s analytical depth.
- Geographic Harmonization: Geographic data is
aligned with the electoral dataset, ensuring consistency in spatial
analysis.
- Conflict Data Integration: Conflict data (UCDP and
ACLED) is integrated to provide a contextual understanding of the
election periods, categorized by types of violence and involved
parties.
- Nightlight Data Analysis: Nightlight data is
processed to analyze trends over time, providing additional context for
the electoral analysis.
Each of these steps is critical in transforming raw election data
into a structured and analyzable format, facilitating a thorough and
nuanced understanding of electoral trends and patterns.
1-READ DATA
This section processes and combines election data from different
years to prepare it for further analysis. Specifically, it handles data
from the years 2006, 2011, and 2018, sourced from multiple Excel
files.
2006 Data: The script reads two datasets for
2006. The first dataset (data/2006first_round.xlsx) is
filtered to remove rows representing provincial totals, and the related
columns are excluded. The second dataset
(data/2006clean.xlsx), labeled as the second round, undergoes a
similar cleaning process. The two datasets are then merged on the
Territoire/ville column.
2011 Data: Data from the 2011 elections
(data/2011drc_election_all_clcr_cleaned_stata.xlsx) is read
and rows containing aggregated totals are excluded.
2018 Data: The script first reads the primary
dataset for 2018 (data/prs_edited.xlsx). It then reads a
separate file containing candidate names and IDs
(data/RESULTAT-PRESIDENTIEL-1.xlsx), which are necessary to
match candidate details correctly across different datasets. Due to
discrepancies in candidate IDs between different sources, the script
includes a manual mapping of IDs
("1001_44_10"="20","1001_48_14"="4","1001_84_158"="13") to
align them correctly based on a comparison with the original data
source. Once the IDs are aligned, the candidate details are merged into
the main 2018 dataset.
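The ID alignment can be sketched as below. This is a stdlib-only Python illustration (the script itself is written in R). The three remappings are the ones listed above. The row layout, the column names `candidate_id`/`candidate_name`, and the candidate names are hypothetical.

```python
# Sketch of the 2018 candidate-ID alignment. The three remappings come from
# the text; the column names "candidate_id"/"candidate_name" are hypothetical.
ID_FIXES = {"1001_44_10": "20", "1001_48_14": "4", "1001_84_158": "13"}


def align_candidate_ids(rows, names_by_id):
    """Remap mismatched IDs, then attach candidate names from the ID table."""
    aligned = []
    for row in rows:
        cid = ID_FIXES.get(row["candidate_id"], row["candidate_id"])
        out = dict(row, candidate_id=cid)  # copy, so the input rows stay unchanged
        # None flags an ID that is still unmatched after the manual fixes
        out["candidate_name"] = names_by_id.get(cid)
        aligned.append(out)
    return aligned


rows = [{"candidate_id": "1001_44_10", "votes": 120},
        {"candidate_id": "7", "votes": 30}]
names = {"20": "Candidate A", "7": "Candidate B"}  # illustrative lookup table
aligned = align_candidate_ids(rows, names)
```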
2-RENAME COLUMNS AND NEST VOTES FOR EACH CANDIDATE
This section processes election data from 2006, 2011, and 2018 to
structure and organize key information regarding electoral votes and
participants. Specifically, it performs the following operations:
For the year 2006, it restructures the data to list each
candidate along with the percentage and calculated number of votes they
received. This transformation involves pivoting the dataset so that
candidate names and their corresponding vote percentages are collated
into a nested data frame, which includes both the candidate names and
the calculated votes based on valid votes and participation
percentages.
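Here is a minimal sketch of the 2006 reshaping, in Python rather than the script's R. It assumes each row holds one percentage column per candidate and that a candidate's votes equal that percentage of the `Votes valables` column. The actual script may also fold in participation percentages, and the sample figures are invented.

```python
def nest_candidates(row, candidate_cols, valid_col="Votes valables"):
    """Move per-candidate percentage columns into a nested list of records."""
    valid = row[valid_col]
    nested = [
        # votes reconstructed as the candidate's share of valid votes
        {"candidate": name, "pct": row[name], "votes": round(row[name] / 100 * valid)}
        for name in candidate_cols
    ]
    # keep every non-candidate column, then attach the nested records
    rest = {k: v for k, v in row.items() if k not in candidate_cols}
    rest["candidates"] = nested
    return rest


row = {"circonscription": "X", "Votes valables": 2000, "KABILA": 55.0, "BEMBA": 45.0}
nested_row = nest_candidates(row, ["KABILA", "BEMBA"])
```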
For the years 2011 and 2018, the script formats the data by
nesting details about each candidate’s votes into a similar structured
format. This includes renaming certain columns for consistency and
clarity, such as the candidate’s name and the number of votes they
received.
Additionally, the script standardizes other important electoral
information across the datasets for these years, such as the number of
registered voters, actual voters, ballot boxes, and the count of
processed ballot boxes. This renaming ensures uniformity across
different election years for easier comparison and analysis.
Overall, this section enhances the accessibility and usability of
election data by organizing it into a consistent format across different
election years, allowing for streamlined analysis and reporting.
3-NEST DATA BY LOCATION AND STANDARDIZE INDEXES
This section focuses on organizing and indexing election data from
2006, 2011, and 2018 by geographic and administrative divisions such as
provinces, cities, and electoral constituencies. It aims to facilitate
easier data comparison and analysis across different election years:
Nesting and Index Creation: The script nests
voting site data within each constituency for the 2018 dataset. For all
years, it establishes indexes based on geographic identifiers like city
names and constituency names, which are standardized to lower case and
stripped of extra spaces for uniformity.
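Index standardization comes down to a small normalizing function. The Python sketch below stands in for the R code. It lower-cases names and collapses repeated whitespace. The function name `make_index` is hypothetical.

```python
import re


def make_index(name):
    """Collapse runs of whitespace, trim the ends, and lower-case a place name."""
    return re.sub(r"\s+", " ", name).strip().lower()


example = make_index("  Ville de  KANANGA ")  # illustrative input
```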
Labeling and Manual Adjustments: The script also
assigns labels to each entry for clearer identification and resolves
inconsistencies in geographic names between datasets. In Kinshasa, for
example, it adjusts city names to subprovince levels to match data
granularity in other years.
Data Matching and Structuring: Discrepancies in
geographic names across the election years are corrected manually to
align the datasets. Names are simplified and matched, and the names used
reflect the administrative changes or differences noted across
datasets.
Nested Structuring: Finally, the data is
restructured into nested formats based on updated indexes and labels.
This allows for detailed yet manageable data subsets, which can be used
for in-depth regional analysis or aggregated to provide broader
electoral insights.
This meticulous organization of data by location enhances the
analytical framework, making it easier to track electoral trends and
patterns across different regions and election cycles.
4-MATCH DATA ACROSS ELECTIONS
This section is dedicated to matching election data across three
election years: 2006, 2011, and 2018. The primary challenge addressed
here is the discrepancy in geographic names across datasets,
particularly because some regions, such as Kinshasa, lack detailed
geographic identifiers in some years. To tackle this, a multi-step
matching strategy is implemented:
Initial Extraction and Naming: The script
extracts names of cities or constituencies (referred to as
villes/circonscriptions) from each year’s dataset and assigns them as
names to character vectors. This approach uses the names of these
vectors for matching, ensuring that the actual content remains
unchanged.
Iterative Matching Process: The matching process
involves several iterative steps where the names in the vectors are
slightly modified in each iteration to accommodate differences in naming
conventions between datasets. For example, the term “ville” is removed
and excess whitespace is cleaned to improve matching accuracy.
Compilation of Matched and Unmatched Names:
After each matching attempt, matched names are compiled, and unmatched
names undergo further processing to refine their format and attempt
another match. This stepwise refinement continues until no further
matches can be found.
Final Data Structuring: The results are then
structured into a comprehensive list that captures both matched and
unmatched names, ensuring that data from different years can be compared
accurately despite initial discrepancies.
The process is meticulous and aims to ensure that electoral data from
different years can be aligned and analyzed consistently, addressing
challenges posed by changes in geographic names and administrative
boundaries over time.
Details: Some circonscriptions are missing in 2011. For Kinshasa,
because the 2011 data lack sufficient detail, we are limited to using
the subprovince level. The strategy is to extract the names of the
villes/circonscriptions into a named character vector. Matching is done
in several steps because the villes are named differently across
elections. First, the names of the vector are set equal to the vector
contents. The names, not the contents, are used to match the villes
across elections. In each matching step, we modify the names of the
vector and leave the vector contents untouched. That way, we end up with
an index of ville matches across elections, even if the actual names
differ.
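The named-vector strategy can be mimicked with dictionaries. The keys play the role of the vector names and the values play the role of the untouched contents. This is a hedged Python sketch, not the script's R code. The three normalization steps and the sample names are assumptions chosen to mirror the examples given earlier in this section (dropping "ville", cleaning excess whitespace).

```python
import re

# Hypothetical normalization steps, applied cumulatively to the working keys.
STEPS = [
    lambda s: s,                               # step 1: exact match
    lambda s: re.sub(r"\bville\b", "", s),     # step 2: drop the word "ville"
    lambda s: re.sub(r"\s+", " ", s).strip(),  # step 3: clean excess whitespace
]


def match_names(names_a, names_b, steps=STEPS):
    """Match place names from two elections in several relaxation steps.

    Each dict maps a working key (rewritten at every step, like the names of
    the R vector) to the original name (never modified, like the contents).
    Assumes keys stay unique after each rewrite.
    """
    a = {n: n for n in names_a}
    b = {n: n for n in names_b}
    matches = []
    for step in steps:
        a = {step(k): v for k, v in a.items()}
        b = {step(k): v for k, v in b.items()}
        for key in sorted(a.keys() & b.keys()):
            # Matched names leave the pool; only leftovers go to the next step.
            matches.append((a.pop(key), b.pop(key)))
    return matches, sorted(a.values()), sorted(b.values())


matches, left_a, left_b = match_names(
    ["ville mbandaka", "bumba", "lisala"],  # illustrative 2006 names
    ["mbandaka", "bumba ", "gemena"],       # illustrative 2011 names
)
```

The returned pairs hold the original spellings from each election, which is what makes them usable as a cross-election index.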
5-MERGE DATA ACROSS ELECTIONS
This section consolidates election data from different years (2006,
2011, and 2018) using previously created matching indexes to ensure
consistency and continuity across datasets. The primary activities in
this process include:
Data Preparation: Indexes that have been aligned
in previous steps are used to merge data across the election years. This
ensures that each entry from different years corresponds to the same
geographic location or administrative division, even when direct matches
in names are not apparent.
Merging Process: The script performs several
full joins to combine the datasets from 2006, 2011, and 2018 based on
these indexes. This method allows for the inclusion of all available
data, whether or not a direct match exists in all three years, thus
preserving the maximum possible data granularity.
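A full join on the shared index can be sketched as follows, as a Python stand-in for the R joins. It assumes one row per index in each year and year-suffixed column names such as `votes_2006`, which are hypothetical. Indexes present in only some years are kept, as described above.

```python
def full_join(left, right, key="index"):
    """Full outer join of two lists of dicts on `key` (one row per key per side).

    Columns are assumed to carry a year suffix, so the two sides never clash.
    A key present on only one side keeps that side's columns; others are absent.
    """
    by_left = {r[key]: r for r in left}
    by_right = {r[key]: r for r in right}
    joined = []
    for k in sorted(by_left.keys() | by_right.keys()):
        row = {key: k}
        row.update(by_left.get(k, {}))
        row.update(by_right.get(k, {}))
        joined.append(row)
    return joined


d2006 = [{"index": "bumba", "votes_2006": 10}]
d2011 = [{"index": "bumba", "votes_2011": 12}, {"index": "gemena", "votes_2011": 7}]
d2018 = [{"index": "lisala", "votes_2018": 5}]
merged = full_join(full_join(d2006, d2011), d2018)
```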
Final Structuring and Cleaning: Once merged, the
data is restructured to create a unified view that includes labels and
province names standardized across years. This structuring is crucial
for analyses that require consistent geographic identifiers across
multiple election cycles.
Variable Cleanup and Data Saving: Unnecessary
variables are removed to tidy up the workspace, and the final structured
dataset is arranged and saved for further analysis or
reporting.
By integrating data from multiple election years, this section
facilitates comprehensive longitudinal electoral analyses, helping to
identify trends and changes over time within the same geographic
locales.
6-DERIVED QUANTITIES
6.1-REGISTERED VOTERS IN EACH LEVEL
This section focuses on calculating the number of registered voters
at various administrative levels for each election year. The
calculations are performed through a series of nested data
manipulations, leveraging the flexibility of functional programming
within R:
Data Transformation: For each election dataset,
the script navigates through multiple nested structures—ranging from
regions down to individual voting sites—to calculate the number of
registered voters based on the available data, such as valid votes and
participation percentages.
Voter Estimation (2006 only): Using the data on
valid votes and the percentage of participation, the script estimates
the total number of registered voters at different levels (e.g.,
circonscription, ville.territoire) by working backward from valid votes
and the participation rate.
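If the participation rate is defined as valid votes divided by registered voters, the inversion is a one-line calculation, sketched here in Python. That definition is an assumption. If participation instead counts all ballots cast (including invalid ones), this estimate will be slightly low.

```python
def estimate_registered(valid_votes, participation_pct):
    """Back out registered voters, assuming participation = valid votes / registered."""
    if not participation_pct:
        return None  # a zero or missing rate cannot be inverted
    return valid_votes / (participation_pct / 100)


estimate = estimate_registered(6000, 60.0)  # illustrative figures
```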
Aggregation: After obtaining the registered
voters at the lowest levels, these figures are summed up through the
nested structures to provide total counts at higher administrative
levels, ensuring that each region’s total reflects all underlying
data.
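The bottom-up aggregation can be illustrated with a recursive sum over a nested structure. In this Python sketch the `children` key is a hypothetical stand-in for the nested R data frames. Missing leaf values are treated as zero (similar to `na.rm = TRUE`), which is an assumption about how the script handles gaps.

```python
def nested_total(node, field):
    """Sum `field` over the leaves of a nested structure, storing subtotals.

    A node is either a leaf dict holding `field` or a dict with a "children"
    list (hypothetical layout).
    """
    if "children" in node:
        total = sum(nested_total(child, field) for child in node["children"])
        node[field] = total  # keep the subtotal at this level as well
        return total
    value = node.get(field)
    return value if value is not None else 0


province = {"name": "P", "children": [
    {"name": "C1", "children": [{"registered": 100}, {"registered": 50}]},
    {"name": "C2", "children": [{"registered": 30}, {"registered": None}]},
]}
total = nested_total(province, "registered")
```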
Data Integration: The computed totals of
registered voters for each level and year are then integrated back into
the main dataset, providing enriched data points that support more
detailed electoral analysis.
By systematically estimating and aggregating registered voter counts
across different levels, this section enhances the dataset’s value for
analyzing voter turnout and electoral engagement across regions and
election cycles.
6.2-VOTERS IN EACH LEVEL
This section is devoted to calculating the total number of voters
from election data spanning three different years: 2006, 2011, and 2018.
Here’s an overview of how the data is processed:
Data Transformation: For each election year, the
data undergoes a series of transformations to calculate the total voters
within each administrative level, such as circonscriptions and voting
sites.
Nested Calculations: The script navigates
through nested structures (circonscription to voting sites) to aggregate
voters at various levels.
Summation and Aggregation: After extracting the
voters for smaller units, the script sums these voters to compute total
figures for larger geographic or administrative areas. This aggregation
helps in understanding the total voter turnout and the distribution
across different regions.
Data Integration and Unnesting: The sums of
voters are then integrated back into the main dataset, ensuring that
each administrative unit’s total voters are reflected in the final
dataset.
6.3-TOTAL VOTES IN EACH LEVEL
This section of the script is dedicated to calculating the total
number of valid votes (referred to as “total votes”) across different
administrative levels for the election years 2006, 2011, and 2018.
Here’s a concise breakdown of the process:
Vote Calculation:
- For 2006, the script directly assigns the number of valid votes
(‘Votes valables’) from the data to a new column called ‘total.votes’
for each circonscription. It then sums these votes to get a total count
for each higher administrative level.
- In 2011, the process involves extracting and summing votes from
nested data structures within each circonscription. This is slightly
more complex as it involves pulling and summing nested vote counts.
- For 2018, the calculation is even more granular, extending down to
voting sites within each ville and territoire. This involves multiple
layers of mapping and summing to aggregate votes all the way up from the
most detailed levels.
Data Aggregation: After calculating the total
votes at the lowest necessary levels, the script aggregates these totals
to provide comprehensive vote counts for larger geographic or
administrative areas.
Integration and Unnesting: The aggregated total
votes are then integrated back into the main dataset.
6.4-VOTES AND PERCENTAGE FOR KABILA IN EACH LEVEL
This section of the script is dedicated to calculating the total
votes and the percentage of votes received by the candidate Kabila (and
Ramazani in 2018) at various administrative levels across the election
years 2006, 2011, and 2018. The method involves several steps:
- Data Extraction:
- The votes specifically for Kabila (or Ramazani in 2018) are
extracted from nested data structures by filtering for the candidate’s
name within each voting site or circonscription.
- Vote Calculation:
- The valid votes for Kabila are aggregated first within the smallest
units (voting sites or circonscriptions) and then summed up to provide
totals for larger administrative areas.
- Percentage Calculation:
- The percentage of votes received by Kabila or Ramazani is calculated
by dividing their total votes by the total votes of all candidates at
each administrative level.
- The calculation is done post-aggregation to ensure it reflects the
comprehensive vote share.
- Data Aggregation and Unnesting:
- After calculating the total votes and percentages, these metrics are
nested back into the main dataset. Then nested lists are simplified into
standard columns for ease of analysis.
- Result Integration:
- The final step involves integrating the calculated votes and
percentages into the main dataset, enhancing the dataset with key
electoral metrics for Kabila and Ramazani.
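The share is computed after aggregation because averaging per-site percentages would weight a site with 100 votes the same as a site with 900. The Python sketch below uses a hypothetical record layout and invented figures. It sums votes first and divides once. Kabila gets 90% and about 11% at the two sites but 19% overall, whereas a simple average of the two percentages would give about 51%.

```python
def candidate_share(sites, candidate):
    """Sum a candidate's votes and all votes over sites, then divide once.

    sites: list of lists of {"candidate": name, "votes": n}, one list per site.
    """
    cand_votes = 0
    all_votes = 0
    for site in sites:
        for rec in site:
            all_votes += rec["votes"]
            if rec["candidate"] == candidate:
                cand_votes += rec["votes"]
    pct = 100 * cand_votes / all_votes if all_votes else None
    return cand_votes, pct


sites = [
    [{"candidate": "KABILA", "votes": 90}, {"candidate": "OTHER", "votes": 10}],
    [{"candidate": "KABILA", "votes": 100}, {"candidate": "OTHER", "votes": 800}],
]
share = candidate_share(sites, "KABILA")
```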
6.5-BALLOT BOXES IN EACH LEVEL (2011, 2018 ONLY)
This section of the script processes and calculates the total number
of ballot boxes counted at various administrative levels during the 2011
and 2018 elections. The approach involves multiple steps to ensure
accurate aggregation of data:
- Mapping and Summation:
- For each entry in the dataset, the script calculates the sum of
ballot boxes counted within each administrative unit, such as
circonscriptions or voting sites.
- This involves iterating over each circonscription and, for the 2018
data, each ville.territoire within the circonscriptions, to aggregate
the counts from the smallest units upwards.
- Data Aggregation:
- The aggregated counts of ballot boxes from the smaller units (voting
sites or circonscriptions) are further summed to provide totals for
larger areas.
- Flattening and Integration:
- The nested results are flattened to simplify the nested lists into a
standard column format within the main dataset.
- Final Aggregation and Calculation:
- After processing individual entries, a final summation is performed
across all entries for each election year to calculate the total number
of ballot boxes counted across all circonscriptions.
- The results are added to the dataframe, providing a clear overview of
ballot box distribution and availability during the elections.
6.6-VOTING SITES WITH ZERO VOTERS (2018 ONLY)
This section identifies and quantifies voting sites with zero voters
from the 2018 election data. It performs the following operations:
- Filtering Voting Sites: The script navigates
through multiple nested data structures, filtering out voting sites
where the number of voters is not available.
- Counting Voting Sites: For each administrative
level (e.g., circonscription, ville.territoire), it counts the number of
voting sites that reported zero voters.
- Aggregating Data: These counts are aggregated at
higher administrative levels, providing a comprehensive total of voting
sites with zero voters for each level.
- Data Integration: The results are integrated back
into the main dataset, adding the total counts of zero-voter sites for
further analysis and reporting.
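Counting zero-voter sites while skipping sites with missing counts can be sketched as follows. This is Python, not the script's R, and the `voters` field name is hypothetical. The missing-value check runs before the zero check, so sites without data are never counted as zero-voter sites.

```python
def count_zero_voter_sites(sites, field="voters"):
    """Count sites reporting zero voters, skipping sites with no count."""
    return sum(1 for s in sites if s.get(field) is not None and s[field] == 0)


sites = [{"voters": 0}, {"voters": None}, {"voters": 12}, {}]
zero_sites = count_zero_voter_sites(sites)
```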
6.7-VOTES AND PERCENTAGES FOR OTHER CANDIDATES IN 2018
This section calculates the total votes and percentages for the
candidates Fayulu and Tshisekedi in the 2018 election. The script
performs the following tasks:
Fayulu’s Votes and Percentages:
- For each administrative level (circonscription, ville.territoire,
voting sites), the script identifies and sums the votes cast for
Fayulu.
- It then calculates the percentage of votes Fayulu received out of the
total votes at each level.
- These results are aggregated and integrated back into the main
dataset, providing Fayulu’s total votes and percentage for 2018.
Tshisekedi’s Votes and Percentages:
- Similar to the process for Fayulu, the script identifies and sums
the votes cast for Tshisekedi at each administrative level.
- It calculates Tshisekedi’s percentage of total votes at each
level.
- The aggregated results are then integrated back into the main
dataset, detailing Tshisekedi’s total votes and percentage for
2018.
6.8-TURNOUT
This section calculates voter turnout rates for the election years
2006, 2011, and 2018. The script performs the following tasks:
- Calculate Turnout Rates: It computes the turnout
rate for each year by dividing the number of voters by the number of
registered voters for 2006, 2011, and 2018.
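Turnout is a simple ratio. The Python sketch below adds a guard for missing or zero registration counts, which is an assumption about how edge cases are handled. For 2006 the denominator is itself the estimated registered-voter count from section 6.1.

```python
def turnout(voters, registered):
    """Turnout as a fraction; None when registration is missing or zero."""
    if not registered:
        return None
    return voters / registered


rate = turnout(450, 900)  # illustrative counts
```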
7-HARMONIZE MAP AND DATA LOCATIONS
This section aligns geographic data with the electoral dataset to
ensure consistency in analysis. It performs the following tasks:
- Read Detailed Shapefiles: Loads detailed shapefiles
for Congo’s territories to get precise geographic boundaries. Please see
data/Les territoires de Congo (territories of Congo)/CONTENTS.html for
more details about the boundaries used.
- Filter and Standardize Kinshasa Regions: Filters
the shapefile data to include only Kinshasa regions and standardizes
names by replacing hyphens with spaces.
- Update and Summarize Geometry Data: Updates the
Kinshasa shapefile data with standardized names, selects relevant
columns, groups by names, and summarizes geometries.
- Read Main Shapefiles: Loads the main shapefile for
Congo’s territories (data/cod_adm2_un/cod_adm2_un.shp) and creates an
index by standardizing territory names.
- Exclude Kinshasa: Excludes Kinshasa from the main
shapefile data.
- Create Indices for Matching: Extracts and
standardizes names from both the map and data for matching
purposes.
- Manual Name Matching: Applies manual name
corrections for regions and cities to ensure consistency between map and
data.
- Match Indices: Matches data names with map names
and identifies unmatched names.
- Create Index DataFrame: Creates a DataFrame to link
data and map indices, ensuring unique matches.
- Join and Summarize Map Data: Joins the map data
with matched indices, updates index names, and groups by index to
summarize geometries.
- Transform Projections: Ensures the geographic
projections of Kinshasa borders match the main dataset.
- Combine Borders: Combines Kinshasa borders with the
main territory borders.
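The index-matching steps can be sketched in Python as below: manual corrections, matching, listing unmatched names, and a uniqueness check. The correction table and place names are illustrative assumptions. This version enforces that each map name is linked to at most one data index, which is one possible reading of "ensuring unique matches".

```python
def link_indices(data_names, map_names, manual):
    """Link data indices to map indices after applying manual corrections.

    manual: hypothetical dict of hand-made fixes, data name -> map name.
    Returns (pairs, unmatched_data_names).
    """
    map_set = set(map_names)
    pairs = {}
    unmatched = []
    for name in data_names:
        target = manual.get(name, name)  # corrected name if one exists
        if target in map_set:
            pairs[name] = target
        else:
            unmatched.append(name)
    targets = list(pairs.values())
    if len(targets) != len(set(targets)):
        raise ValueError("a map region is linked to more than one data index")
    return pairs, unmatched


pairs, unmatched = link_indices(
    ["bumba", "kinshasa mont amba", "tshikapa"],
    ["bumba", "mont amba", "kananga"],
    {"kinshasa mont amba": "mont amba"},
)
```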
This section harmonizes geographic and electoral data, facilitating
accurate mapping and spatial analysis of election results.
8-CONFLICT DATA
This section integrates conflict data with geographic boundaries for
detailed spatial analysis. It performs the following tasks:
Load Conflict Data: Loads the UCDP Georeferenced
Event Dataset (GED) for organized violence events (see below for more
information on the data used).
Set Map Projection: Obtains the coordinate
reference system (CRS) from the Congo territory borders to ensure
consistency in spatial analysis.
Georeference Conflict Data: Converts the
conflict data into a spatial format using longitude and latitude
coordinates.
Filter Conflicts in DRC: Filters the conflict
events to include only those within the geographic boundaries of the
Democratic Republic of Congo (DRC).
Categorize Types of Violence: Classifies the
conflict events into categories based on the type of violence and the
involved parties.
- Non-state vs non-state: type_of_violence is 2 and
side_b is not “Civilians”
- State vs non-state: type_of_violence is 1 and
side_b is not “Civilians”
- State vs civilians: type_of_violence is 3 and
side_a contains “Government” and side_b is “Civilians”
- Non-state vs civilians: type_of_violence is 3 and
side_a does not contain “Government”, and side_b is “Civilians”.
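The four rules translate directly into a classification function. This Python sketch uses the UCDP GED column names quoted above. Returning `None` for events that fit none of the rules is an assumption about how the script treats them.

```python
def categorize(event):
    """Classify a UCDP GED event with the four rules listed above."""
    tov = event["type_of_violence"]
    side_a, side_b = event["side_a"], event["side_b"]
    civilians = side_b == "Civilians"
    if tov == 2 and not civilians:
        return "Non-state vs non-state"
    if tov == 1 and not civilians:
        return "State vs non-state"
    if tov == 3 and civilians:
        if "Government" in side_a:
            return "State vs civilians"
        return "Non-state vs civilians"
    return None  # event not covered by the four rules
```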
By integrating and categorizing conflict data within the geographic
context of the DRC, this section enhances the dataset’s ability to
support spatial analysis of organized violence events.
UCDP Georeferenced Event Dataset (GED) Global version 20.1
This dataset is UCDP’s most disaggregated dataset, covering
individual events of organized violence (phenomena of lethal violence
occurring at a given time and place). These events are sufficiently
fine-grained to be geo-coded down to the level of individual villages,
with temporal durations disaggregated to single, individual days.
Please cite:
- Pettersson, Therese & Magnus Öberg (2020) Organized violence,
1989-2019. Journal of Peace Research 57(4).
- Sundberg, Ralph and Erik Melander (2013) Introducing the UCDP
Georeferenced Event Dataset. Journal of Peace Research 50(4).
Downloaded from https://ucdp.uu.se/downloads/index.html#ged_global on
2021/01/30
8.1 - AGGREGATE CONFLICT DATA FOR EACH ELECTION AND MAP REGION
This section aggregates conflict data for each election period and
map region, focusing on conflicts from the years preceding each
election. The script performs the following tasks:
Assign Election Periods: Categorizes conflict
events into election periods based on their dates.
- 2006: from 2001-01-17 to 2006-07-30
- 2011: from 2006-07-31 to 2011-11-28
- 2018: from 2011-11-29 to 2018-12-30
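Period assignment is an interval lookup on the event date, sketched here in Python with the boundaries listed above (both ends inclusive). The document does not state which GED date field the script uses (for example `date_start`), so the function simply takes a `datetime.date`. Events outside all three windows return `None` and would be removed by the filtering step.

```python
from datetime import date

# Period boundaries from the list above, both ends inclusive.
PERIODS = [
    ("2006", date(2001, 1, 17), date(2006, 7, 30)),
    ("2011", date(2006, 7, 31), date(2011, 11, 28)),
    ("2018", date(2011, 11, 29), date(2018, 12, 30)),
]


def election_period(d):
    """Return the election label whose window contains date d, else None."""
    for label, start, end in PERIODS:
        if start <= d <= end:
            return label
    return None
```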
Filter Relevant Conflicts: Retains only conflict
events that fall within the specified election periods.
Group and Nest Data: Groups conflicts by data
area and election period, nesting the data within each group for further
analysis.
Summarize Conflict Data: Counts the number of
conflicts and related deaths for each election period.
Aggregate by Conflict Type: Groups and counts
conflicts by type of violence, further categorizing them based on
involved parties. side_a is modified to aggregate non-DRC state
participants as “Foreign” and DRC state participants as “DRC”.
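The grouping and the side_a recoding can be sketched as follows. In this Python illustration, deaths are taken from the GED `best` estimate and the DRC government is recognized by the prefix "Government of DR Congo". Both are assumptions. So is the reading that only state actors are recoded, while non-state actors keep their names.

```python
from collections import defaultdict


def recode_side_a(side_a):
    """DRC government -> "DRC"; any other government -> "Foreign"; others unchanged."""
    if side_a.startswith("Government of DR Congo"):  # assumed UCDP spelling
        return "DRC"
    if side_a.startswith("Government of"):
        return "Foreign"
    return side_a


def summarize(events):
    """Count events and sum deaths per (area, period, type_of_violence, side_a)."""
    out = defaultdict(lambda: {"conflicts": 0, "deaths": 0})
    for e in events:
        key = (e["area"], e["period"], e["type_of_violence"], recode_side_a(e["side_a"]))
        out[key]["conflicts"] += 1
        out[key]["deaths"] += e["best"]
    return dict(out)


events = [
    {"area": "beni", "period": "2018", "type_of_violence": 1,
     "side_a": "Government of DR Congo (Zaire)", "best": 3},
    {"area": "beni", "period": "2018", "type_of_violence": 1,
     "side_a": "Government of Uganda", "best": 5},
    {"area": "beni", "period": "2018", "type_of_violence": 1,
     "side_a": "Government of Rwanda", "best": 2},
]
table = summarize(events)
```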
By organizing and summarizing conflict data in this manner, this
section facilitates detailed analysis of conflict patterns in relation
to election periods and geographic regions.
8.2 - RECODING CONFLICT ACTORS
This section processes conflict data to categorize and summarize
conflicts by actor types, ensuring all actors are accounted for and
correctly classified. The steps include:
- Read Actor Types: Loads data on conflict actors and
their classifications from an Excel file (data/DRC armed groups in UCDP
dataset Rwanda and Uganda.xlsx).
- Filter Conflict Data: Ensures that the conflict
dataset only includes actors present in the actor types list.
- Add Actor Type Information: Merges actor type
classifications into the conflict data.
- Summarize Conflicts by Actor Type 1 and 2:
Aggregates conflicts and death counts by actor type and election period,
ensuring data consistency and completeness. For actor type 2, data is
also summarized by territory.
- Merge Reversed Actor Types: Combines conflict data
where actor roles (side_a and side_b) are reversed.
- Generate Actor Type Combinations: Creates all
possible combinations of actor types for comprehensive analysis.
- Calculate Total Conflicts and Deaths: Aggregates
total counts across all years and ensures consistency with the original
data.
- Export Results: Summarizes the data and exports it
to Excel files for further analysis (results/Conflict by actor table
type 1.xlsx and results/Conflict by actor table type 2.xlsx).
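Merging reversed actor roles is equivalent to counting unordered pairs of actor types. Generating the full set of combinations up front makes empty cells appear as zeros. The Python sketch below assumes an actor-to-type lookup like the one read from the Excel file. The actor and type names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def pair_counts(events, actor_type):
    """Count events per unordered pair of actor types, so (A, B) and (B, A) merge.

    Events whose actors are missing from the lookup are skipped, mirroring the
    filtering step described above.
    """
    counts = Counter()
    for e in events:
        if e["side_a"] not in actor_type or e["side_b"] not in actor_type:
            continue
        pair = tuple(sorted((actor_type[e["side_a"]], actor_type[e["side_b"]])))
        counts[pair] += 1
    return counts


def full_table(counts, types):
    """List every combination of actor types, with 0 where no event occurred."""
    return {pair: counts.get(pair, 0)
            for pair in combinations_with_replacement(sorted(types), 2)}


actor_type = {"gov": "State", "adf": "Rebel", "m23": "Rebel"}  # hypothetical
events = [{"side_a": "gov", "side_b": "adf"}, {"side_a": "adf", "side_b": "gov"},
          {"side_a": "m23", "side_b": "adf"}, {"side_a": "gov", "side_b": "unknown"}]
table = full_table(pair_counts(events, actor_type), {"State", "Rebel"})
```

Summing the table gives the number of retained events, which is the kind of consistency check against the original data mentioned above.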
This process standardizes conflict actor data and provides a
structured summary for analyzing the impact of different actor types in
conflicts.
9 - NIGHTLIGHT DATA
9.1. READ DATA
Raw nightlight data was obtained from Li, X., Zhou, Y., Zhao, M. et
al. A harmonized global nighttime light dataset 1992–2018. Sci Data 7,
168 (2020). https://doi.org/10.1038/s41597-020-0510-y
Raw global nightlight data
(Harmonized_DN_NTL_[year]_calDMSP.tif and
Harmonized_DN_NTL_[year]_simVIIRS.tif files) is not redistributed
with this paper, but it can be downloaded from the data repository of the paper cited above at https://doi.org/10.6084/m9.figshare.9828827.v2. The
DRC data can then be extracted for the 2001-2008 period using the following
code.
The extracted DRC data is provided as an RData file, a 175
MB file stored using Git Large File Storage on GitHub. Extra steps may
be required to retrieve it; please check the Large File Storage Support
section of this README.
9.2. MEAN
This section calculates the mean nightlight values and processes the
data to analyze trends. Specifically, it performs the following
steps:
Calculate Mean Nightlight Values: For each
geographic index and year, it calculates the mean nightlight intensity,
ignoring missing values. This helps in understanding the average
nightlight levels over time.
Threshold Adjustment for Urban Areas: It creates
a modified dataset where nightlight values less than 30 are set to zero.
This threshold is suggested by Li et al. (2020) to identify urban
areas.
Calculate Mean for Adjusted Data: As in the first
step, it calculates the mean nightlight values for the adjusted
dataset to analyze trends in significant nightlight
intensities.
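The two means can be sketched in Python as below (the script uses R). The cutoff of 30 is the threshold described above. The pixel values are invented, and representing missing pixels as `None` is an assumption.

```python
def mean_ignore_missing(values):
    """Mean of the non-missing pixel values (like mean(x, na.rm = TRUE) in R)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def zero_below(values, cutoff=30):
    """Set pixel values below `cutoff` to zero; missing pixels stay missing."""
    return [None if v is None else (0 if v < cutoff else v) for v in values]


pixels = [10, 40, None, 50]  # illustrative DN values for one index and year
raw_mean = mean_ignore_missing(pixels)
urban_mean = mean_ignore_missing(zero_below(pixels))
```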
9.3. TRENDS
This section calculates trends in nightlight data over specific
periods using linear regression models. It separates the data into two
eras: the DMSP era (up to 2011) and the VIIRS era (from 2014 onwards).
For each era, it fits separate linear models to determine the trends in
nightlight intensity. The calculated trends are then combined into a
single dataset, providing insights into changes in nightlight intensity
over time. Similar calculations are performed for both the original
nightlight data and the modified dataset where nightlight values below
30 are set to zero.
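A per-era trend can be computed as the ordinary least-squares slope of mean nightlight on year. This Python sketch applies the era boundaries given above (up to 2011, and from 2014). Years in between are excluded. Fitting only when an era has at least two years is an assumption.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def era_trends(series):
    """Fit one slope per era; `series` maps year -> mean nightlight for one index."""
    eras = {
        "DMSP": {y: v for y, v in series.items() if y <= 2011},
        "VIIRS": {y: v for y, v in series.items() if y >= 2014},
    }
    trends = {}
    for era, part in eras.items():
        if len(part) >= 2:  # a slope needs at least two years
            years = sorted(part)
            trends[era] = ols_slope(years, [part[y] for y in years])
    return trends


series = {2009: 1.0, 2010: 2.0, 2011: 3.0, 2012: 100.0,
          2014: 10.0, 2015: 8.0, 2016: 6.0}
trends = era_trends(series)
```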
10-ACLED DATA
This section processes the ACLED conflict event data for the
Democratic Republic of Congo from 2001 to 2018. It performs several key
tasks:
- Data Loading and Filtering: The ACLED data is
loaded from
data/ACLED data/1900-01-01-2022-12-20-Democratic_Republic_of_Congo.csv,
dates are converted to a standard format, and events within the
specified date range are retained.
- Categorizing Time Periods: Events are categorized
into three time periods: 2001-2006, 2006-2011, and 2011-2018.
- Standardizing Location Names: The admin2 column is
cleaned and standardized to create a consistent index for geographic
locations.
- Assigning Actor Types: Actor type data is read from
data/DRC armed groups in ACLED dataset.xlsx and matched
with event actors to categorize them into predefined groups.
- Summarizing Events: The data is filtered for
specific event types and summarized to calculate the number of deaths
and conflicts for each period, location, and event type.
- Data Validation: Checks are performed to ensure
data integrity and correctness after transformations.
- Exporting Data: The processed and filtered data is
exported to
results/ACLED_detailed.xlsx for further
analysis and reporting.
Raleigh, C., Kishi, R. & Linke, A. Political instability patterns
are obscured by conflict dataset scope conditions, sources, and coding
choices. Humanit Soc Sci Commun 10, 74 (2023). https://doi.org/10.1057/s41599-023-01559-4
From https://acleddata.com/download/35181/ : If using ACLED
data in a visual, graphic, or map of your own, please attribute the
source data clearly and prominently on the visual itself or within the
key/legend and include a link to ACLED’s website. This can be in small
print on the bottom of the image. Please note your date of data access.
These citations should be included for both standalone infographics as
well as for tables/figures within a larger report. If unable to include
a link on a static visual file, please note “acleddata.com” as the
source URL. When sharing such an image on social media, please (1) be
sure that the citation is not cut off, and (2) please tag ACLED
(Twitter; Facebook; LinkedIn).
11-MERGE VILLES INTO THEIR TERRITORIES
This section identifies and merges smaller administrative regions
(villes) into their larger surrounding territories for more accurate
data analysis. The process involves:
- Identifying Intersections: Checking which villes
should be merged by analyzing geographic intersections of territory
borders.
- Creating Plots for Manual Checking: Generating
plots of intersection geometries for manual verification.
- Defining Merges: Specifying which villes should be
merged into which territories.
- Updating Labels: Adjusting the labels for the
villes being merged.
- Merging Data: Using a custom function to merge the
data of specified villes into their corresponding territories.
- Summarizing Results: Aggregating various metrics
such as voter counts and election results for the merged territories,
and calculating additional percentages and turnout rates.
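The merge step can be sketched as follows: the ville's counts are summed into its territory, and the derived ratios are recomputed from the summed counts rather than by averaging percentages. The Python function below is a hypothetical stand-in for the script's custom R function. The field names and figures are invented.

```python
def merge_into_territory(rows, merges):
    """Fold ville rows into their territory and recompute derived ratios.

    rows: index -> numeric counts (hypothetical fields registered, voters,
          total_votes, kabila_votes); merges: ville index -> territory index.
    """
    merged = {k: dict(v) for k, v in rows.items() if k not in merges}
    for ville, territory in merges.items():
        for field, value in rows[ville].items():
            merged[territory][field] = merged[territory].get(field, 0) + value
    # Ratios are recomputed from the summed counts, never averaged.
    for r in merged.values():
        r["turnout"] = r["voters"] / r["registered"] if r["registered"] else None
        r["kabila_pct"] = (100 * r["kabila_votes"] / r["total_votes"]
                           if r["total_votes"] else None)
    return merged


rows = {
    "ville_x": {"registered": 400, "voters": 200, "total_votes": 180, "kabila_votes": 90},
    "territoire_y": {"registered": 600, "voters": 300, "total_votes": 270, "kabila_votes": 135},
}
result = merge_into_territory(rows, {"ville_x": "territoire_y"})
```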